Efficient comparative phylogenetics on large trees
Abstract
Motivation: Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.

Results: Here we present a new R package, named 'castor', for comparative phylogenetics on large trees comprising millions of tips. On large trees castor is often 100-1000 times faster than existing tools.

Availability and implementation: The castor source code, compiled binaries, documentation and usage examples are freely available at the Comprehensive R Archive Network (CRAN).

Supplementary information: Supplementary data are available at Bioinformatics online.
Similar resources
Methods and Architectures for Realizing Fast Phylogenetic Computation Engines Using VLSI Array Based Logic
Evaluating phylogenetic trees is an endeavor fundamental to comparative genomics and a core discipline of Bioinformatics. However, with single trees taking up to a week on the fastest processor under general models of evolution and the number of trees growing exponentially with the number of sequences analyzed, this is an exceptionally computationally intensive endeavor. There has been much wo...
Graphical Methods for Visualizing Comparative Data on Phylogenies
Phylogenies have emerged as central in evolutionary biology over the past three decades or more, and an extraordinary expansion in the breadth and sophistication of phylogenetic comparative methods has played a large role in this growth. In this chapter, I focus on a somewhat neglected area: the use of graphical methods to simultaneously represent comparative data and trees. As this research ar...
Confirming the Phylogeny of Mammals by Use of Large Comparative Sequence Data Sets
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computatio...
Comparative Cultural Phylogenetics and the Transmission of Belief in an Oral Society
Cultural transmission typically results in a network of connections between cultural units (such as individuals or social groups), not the branching patterns of descent seen in genetic inheritance. As a consequence, the application of phylogenetic (or evolutionary clustering) methods to cultural history faces methodological problems. Most importantly, the history of relationships inferred throu...
A New Distance-based Approach for Phylogenetic Analysis of Protein Sequences
With the availability of ever-increasing gene and protein sequence data across a large number of species, reconstruction of phylogenetic trees to reveal the evolutionary relationship among those species becomes more and more important. In this paper, we take the physicochemical properties of amino acids into account and introduce the protein feature sequences into phylogenetic analysis by using...
Journal: Bioinformatics
Volume 34, Issue 6
Publication date: 2018